{
  "cells": [
    {
      "cell_type": "raw",
      "id": "1957f5cb",
      "metadata": {
        "vscode": {
          "languageId": "raw"
        }
      },
      "source": [
        "---\n",
        "sidebar_label: MongoDB Atlas\n",
        "sidebar_class_name: node-only\n",
        "---"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "ef1f0986",
      "metadata": {},
      "source": [
        "# MongoDB Atlas\n",
        "\n",
        "```{=mdx}\n",
        ":::tip Compatibility\n",
        "Only available on Node.js.\n",
        "\n",
        "You can still create API routes that use MongoDB with Next.js by setting the `runtime` variable to `nodejs` like so:\n",
        "\n",
        "`export const runtime = \"nodejs\";`\n",
        "\n",
        "You can read more about Edge runtimes in the Next.js documentation [here](https://nextjs.org/docs/app/building-your-application/rendering/edge-and-nodejs-runtimes).\n",
        ":::\n",
        "```\n",
        "\n",
        "This guide provides a quick overview for getting started with MongoDB Atlas [vector stores](/docs/concepts/#vectorstores). For detailed documentation of all `MongoDBAtlasVectorSearch` features and configurations head to the [API reference](https://api.js.langchain.com/classes/langchain_mongodb.MongoDBAtlasVectorSearch.html)."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "c824838d",
      "metadata": {},
      "source": [
        "## Overview\n",
        "\n",
        "### Integration details\n",
        "\n",
        "| Class | Package | [PY support](https://python.langchain.com/docs/integrations/vectorstores/mongodb_atlas/) |  Package latest |\n",
        "| :--- | :--- | :---: | :---: |\n",
        "| [`MongoDBAtlasVectorSearch`](https://api.js.langchain.com/classes/langchain_mongodb.MongoDBAtlasVectorSearch.html) | [`@langchain/mongodb`](https://www.npmjs.com/package/@langchain/mongodb) | ✅ | ![NPM - Version](https://img.shields.io/npm/v/@langchain/mongodb?style=flat-square&label=%20&) |"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "36fdc060",
      "metadata": {},
      "source": [
        "## Setup\n",
        "\n",
        "To use MongoDB Atlas vector stores, you'll need to configure a MongoDB Atlas cluster and install the `@langchain/mongodb` integration package.\n",
        "\n",
        "### Initial Cluster Configuration\n",
        "\n",
        "To create a MongoDB Atlas cluster, navigate to the [MongoDB Atlas website](https://www.mongodb.com/products/platform/atlas-database) and create an account if you don't already have one.\n",
        "\n",
        "Create and name a cluster when prompted, then find it under `Database`. Select `Browse Collections` and create either a blank collection or one from the provided sample data.\n",
        "\n",
        "**Note:** The cluster created must be MongoDB 7.0 or higher.\n",
        "\n",
        "### Creating an Index\n",
        "\n",
        "After configuring your cluster, you'll need to create an index on the collection field you want to search over.\n",
        "\n",
        "Switch to the `Atlas Search` tab and click `Create Search Index`. From there, make sure you select `Atlas Vector Search - JSON Editor`, then select the appropriate database and collection and paste the following into the textbox:\n",
        "\n",
        "```json\n",
        "{\n",
        "  \"fields\": [\n",
        "    {\n",
        "      \"numDimensions\": 1536,\n",
        "      \"path\": \"embedding\",\n",
        "      \"similarity\": \"euclidean\",\n",
        "      \"type\": \"vector\"\n",
        "    }\n",
        "  ]\n",
        "}\n",
        "```\n",
        "\n",
        "Note that the dimensions property should match the dimensionality of the embeddings you are using. For example, Cohere embeddings have 1024 dimensions, and by default OpenAI embeddings have 1536:\n",
        "\n",
        "Note: By default the vector store expects an index name of default, an indexed collection field name of embedding, and a raw text field name of text. You should initialize the vector store with field names matching your index name collection schema as shown below.\n",
        "\n",
        "Finally, proceed to build the index.\n",
        "\n",
        "### Embeddings\n",
        "\n",
        "This guide will also use [OpenAI embeddings](/docs/integrations/text_embedding/openai), which require you to install the `@langchain/openai` integration package. You can also use [other supported embeddings models](/docs/integrations/text_embedding) if you wish.\n",
        "\n",
        "### Installation\n",
        "\n",
        "Install the following packages:\n",
        "\n",
        "```{=mdx}\n",
        "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n",
        "import Npm2Yarn from \"@theme/Npm2Yarn\";\n",
        "\n",
        "<IntegrationInstallTooltip></IntegrationInstallTooltip>\n",
        "\n",
        "<Npm2Yarn>\n",
        "  @langchain/mongodb mongodb @langchain/openai @langchain/core\n",
        "</Npm2Yarn>\n",
        "```\n",
        "\n",
        "### Credentials\n",
        "\n",
        "Once you've done the above, set the `MONGODB_ATLAS_URI` environment variable from the `Connect` button in Mongo's dashboard. You'll also need your DB name and collection name:\n",
        "\n",
        "```typescript\n",
        "process.env.MONGODB_ATLAS_URI = \"your-atlas-url\";\n",
        "process.env.MONGODB_ATLAS_COLLECTION_NAME = \"your-atlas-db-name\";\n",
        "process.env.MONGODB_ATLAS_DB_NAME = \"your-atlas-db-name\";\n",
        "```\n",
        "\n",
        "If you are using OpenAI embeddings for this guide, you'll need to set your OpenAI key as well:\n",
        "\n",
        "```typescript\n",
        "process.env.OPENAI_API_KEY = \"YOUR_API_KEY\";\n",
        "```\n",
        "\n",
        "If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:\n",
        "\n",
        "```typescript\n",
        "// process.env.LANGSMITH_TRACING=\"true\"\n",
        "// process.env.LANGSMITH_API_KEY=\"your-api-key\"\n",
        "```"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "93df377e",
      "metadata": {},
      "source": [
        "## Instantiation\n",
        "\n",
        "Once you've set up your cluster as shown above, you can initialize your vector store as follows:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "id": "dc37144c-208d-4ab3-9f3a-0407a69fe052",
      "metadata": {
        "tags": []
      },
      "outputs": [],
      "source": [
        "import { MongoDBAtlasVectorSearch } from \"@langchain/mongodb\";\n",
        "import { OpenAIEmbeddings } from \"@langchain/openai\";\n",
        "import { MongoClient } from \"mongodb\";\n",
        "\n",
        "const client = new MongoClient(process.env.MONGODB_ATLAS_URI || \"\");\n",
        "const collection = client.db(process.env.MONGODB_ATLAS_DB_NAME)\n",
        "  .collection(process.env.MONGODB_ATLAS_COLLECTION_NAME);\n",
        "\n",
        "const embeddings = new OpenAIEmbeddings({\n",
        "  model: \"text-embedding-3-small\",\n",
        "});\n",
        "\n",
        "const vectorStore = new MongoDBAtlasVectorSearch(embeddings, {\n",
        "  collection: collection,\n",
        "  indexName: \"vector_index\", // The name of the Atlas search index. Defaults to \"default\"\n",
        "  textKey: \"text\", // The name of the collection field containing the raw content. Defaults to \"text\"\n",
        "  embeddingKey: \"embedding\", // The name of the collection field containing the embedded text. Defaults to \"embedding\"\n",
        "});"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "ac6071d4",
      "metadata": {},
      "source": [
        "## Manage vector store\n",
        "\n",
        "### Add items to vector store\n",
        "\n",
        "You can now add documents to your vector store:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "id": "17f5efc0",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[ '1', '2', '3', '4' ]\n"
          ]
        }
      ],
      "source": [
        "import type { Document } from \"@langchain/core/documents\";\n",
        "\n",
        "const document1: Document = {\n",
        "  pageContent: \"The powerhouse of the cell is the mitochondria\",\n",
        "  metadata: { source: \"https://example.com\" }\n",
        "};\n",
        "\n",
        "const document2: Document = {\n",
        "  pageContent: \"Buildings are made out of brick\",\n",
        "  metadata: { source: \"https://example.com\" }\n",
        "};\n",
        "\n",
        "const document3: Document = {\n",
        "  pageContent: \"Mitochondria are made out of lipids\",\n",
        "  metadata: { source: \"https://example.com\" }\n",
        "};\n",
        "\n",
        "const document4: Document = {\n",
        "  pageContent: \"The 2024 Olympics are in Paris\",\n",
        "  metadata: { source: \"https://example.com\" }\n",
        "}\n",
        "\n",
        "const documents = [document1, document2, document3, document4];\n",
        "\n",
        "await vectorStore.addDocuments(documents, { ids: [\"1\", \"2\", \"3\", \"4\"] });"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "dcf1b905",
      "metadata": {},
      "source": [
        "**Note:** After adding documents, there is a slight delay before they become queryable.\n",
        "\n",
        "Adding a document with the same `id` as an existing document will update the existing one.\n",
        "\n",
        "### Delete items from vector store"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "id": "ef61e188",
      "metadata": {},
      "outputs": [],
      "source": [
        "await vectorStore.delete({ ids: [\"4\"] });"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "c3620501",
      "metadata": {},
      "source": [
        "## Query vector store\n",
        "\n",
        "Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. \n",
        "\n",
        "### Query directly\n",
        "\n",
        "Performing a simple similarity search can be done as follows:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "id": "aa0a16fa",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "* The powerhouse of the cell is the mitochondria [{\"_id\":\"1\",\"source\":\"https://example.com\"}]\n",
            "* Mitochondria are made out of lipids [{\"_id\":\"3\",\"source\":\"https://example.com\"}]\n"
          ]
        }
      ],
      "source": [
        "const similaritySearchResults = await vectorStore.similaritySearch(\"biology\", 2);\n",
        "\n",
        "for (const doc of similaritySearchResults) {\n",
        "  console.log(`* ${doc.pageContent} [${JSON.stringify(doc.metadata, null)}]`);\n",
        "}"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "3ed9d733",
      "metadata": {},
      "source": [
        "### Filtering\n",
        "\n",
        "MongoDB Atlas supports pre-filtering of results on other fields. They require you to define which metadata fields you plan to filter on by updating the index you created initially. Here's an example:\n",
        "\n",
        "```json\n",
        "{\n",
        "  \"fields\": [\n",
        "    {\n",
        "      \"numDimensions\": 1024,\n",
        "      \"path\": \"embedding\",\n",
        "      \"similarity\": \"euclidean\",\n",
        "      \"type\": \"vector\"\n",
        "    },\n",
        "    {\n",
        "      \"path\": \"source\",\n",
        "      \"type\": \"filter\"\n",
        "    }\n",
        "  ]\n",
        "}\n",
        "```\n",
        "\n",
        "Above, the first item in `fields` is the vector index, and the second item is the metadata property you want to filter on. The name of the property is the value of the `path` key. So the above index would allow us to search on a metadata field named `source`.\n",
        "\n",
        "Then, in your code you can use [MQL Query Operators](https://www.mongodb.com/docs/manual/reference/operator/query/) for filtering.\n",
        "\n",
        "The below example illustrates this:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "id": "bc8f242e",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "* The powerhouse of the cell is the mitochondria [{\"_id\":\"1\",\"source\":\"https://example.com\"}]\n",
            "* Mitochondria are made out of lipids [{\"_id\":\"3\",\"source\":\"https://example.com\"}]\n"
          ]
        }
      ],
      "source": [
        "const filter = {\n",
        "  preFilter: {\n",
        "    source: {\n",
        "      $eq: \"https://example.com\",\n",
        "    },\n",
        "  },\n",
        "}\n",
        "\n",
        "const filteredResults = await vectorStore.similaritySearch(\"biology\", 2, filter);\n",
        "\n",
        "for (const doc of filteredResults) {\n",
        "  console.log(`* ${doc.pageContent} [${JSON.stringify(doc.metadata, null)}]`);\n",
        "}"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "69326bba",
      "metadata": {},
      "source": [
        "### Returning scores\n",
        "\n",
        "If you want to execute a similarity search and receive the corresponding scores you can run:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "id": "5efd2eaa",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "* [SIM=0.374] The powerhouse of the cell is the mitochondria [{\"_id\":\"1\",\"source\":\"https://example.com\"}]\n",
            "* [SIM=0.370] Mitochondria are made out of lipids [{\"_id\":\"3\",\"source\":\"https://example.com\"}]\n"
          ]
        }
      ],
      "source": [
        "const similaritySearchWithScoreResults = await vectorStore.similaritySearchWithScore(\"biology\", 2, filter)\n",
        "\n",
        "for (const [doc, score] of similaritySearchWithScoreResults) {\n",
        "  console.log(`* [SIM=${score.toFixed(3)}] ${doc.pageContent} [${JSON.stringify(doc.metadata)}]`);\n",
        "}"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "0c235cdc",
      "metadata": {},
      "source": [
        "### Query by turning into retriever\n",
        "\n",
        "You can also transform the vector store into a [retriever](/docs/concepts/retrievers) for easier usage in your chains. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "id": "f3460093",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[\n",
            "  Document {\n",
            "    pageContent: 'The powerhouse of the cell is the mitochondria',\n",
            "    metadata: { _id: '1', source: 'https://example.com' },\n",
            "    id: undefined\n",
            "  },\n",
            "  Document {\n",
            "    pageContent: 'Mitochondria are made out of lipids',\n",
            "    metadata: { _id: '3', source: 'https://example.com' },\n",
            "    id: undefined\n",
            "  }\n",
            "]\n"
          ]
        }
      ],
      "source": [
        "const retriever = vectorStore.asRetriever({\n",
        "  // Optional filter\n",
        "  filter: filter,\n",
        "  k: 2,\n",
        "});\n",
        "await retriever.invoke(\"biology\");"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "e2e0a211",
      "metadata": {},
      "source": [
        "### Usage for retrieval-augmented generation\n",
        "\n",
        "For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
        "\n",
        "- [Tutorials: working with external knowledge](/docs/tutorials/#working-with-external-knowledge).\n",
        "- [How-to: Question and answer with RAG](/docs/how_to/#qa-with-rag)\n",
        "- [Retrieval conceptual docs](/docs/concepts/retrieval)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "069f1b5f",
      "metadata": {},
      "source": [
        "## Closing connections\n",
        "\n",
        "Make sure you close the client instance when you are finished to avoid excessive resource consumption:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "id": "f71ce986",
      "metadata": {},
      "outputs": [],
      "source": [
        "await client.close();"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "8a27244f",
      "metadata": {},
      "source": [
        "## API reference\n",
        "\n",
        "For detailed documentation of all `MongoDBAtlasVectorSearch` features and configurations head to the [API reference](https://api.js.langchain.com/classes/langchain_mongodb.MongoDBAtlasVectorSearch.html)."
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "TypeScript",
      "language": "typescript",
      "name": "tslab"
    },
    "language_info": {
      "codemirror_mode": {
        "mode": "typescript",
        "name": "javascript",
        "typescript": true
      },
      "file_extension": ".ts",
      "mimetype": "text/typescript",
      "name": "typescript",
      "version": "3.7.2"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}